 pattern analysis and machine intelligence


Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives

Jiang, Kai, Huang, Siqi, Chen, Xiangyu, Shao, Jiawei, Zhang, Hongyuan, Li, Xuelong

arXiv.org Artificial Intelligence

Continual learning in visual understanding aims to deal with catastrophic forgetting in Multimodal Large Language Models (MLLMs). MLLMs deployed on devices have to continuously adapt to dynamic scenarios in downstream tasks, such as variations in background and perspective, to effectively perform complex visual tasks. To this end, we construct a multimodal visual understanding dataset (MSVQA) encompassing four different scenarios and perspectives (high altitude, underwater, low altitude, and indoor) to investigate catastrophic forgetting in MLLMs under scenario shifts in real-world data streams. Furthermore, we propose mUltimodal coNtInual learning with MLLMs From multi-scenarIo pERspectives (UNIFIER) to address visual discrepancies while learning different scenarios. Specifically, it decouples the visual information from different scenarios into distinct branches within each vision block and projects them into the same feature space. A consistency constraint is imposed on the features of each branch to maintain the stability of visual representations across scenarios. Extensive experiments on the MSVQA dataset demonstrate that UNIFIER effectively alleviates forgetting of cross-scenario tasks and achieves knowledge accumulation within the same scenario.


Graph Matching via Multiplicative Update Algorithm

Bo Jiang, Jin Tang, Chris Ding, Yihong Gong, Bin Luo

Neural Information Processing Systems

As a fundamental problem in computer vision, graph matching can usually be formulated as a Quadratic Programming (QP) problem with doubly stochastic and discrete (integer) constraints. Since it is NP-hard, approximate algorithms are required. In this paper, we present a new algorithm, called Multiplicative Update Graph Matching (MPGM), that develops a multiplicative update technique to solve the QP matching problem. MPGM has three main benefits: (1) theoretically, MPGM naturally solves the general QP problem with doubly stochastic constraints, with guaranteed convergence and KKT optimality.
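To make the QP formulation concrete, the following is a minimal sketch of the problem being solved, not of MPGM itself (the abstract does not give its update rule). It assumes the common affinity construction W[(i,a),(j,b)] = A[i][j] * B[a][b] for two adjacency matrices A and B; the helper names `build_affinity`, `matching_score`, `sinkhorn`, and `brute_force_match` are hypothetical. The discrete problem is solved by exhaustive search on a toy instance, and Sinkhorn normalisation is shown only as one standard way to obtain a doubly stochastic matrix for the relaxed problem.

```python
from itertools import permutations


def build_affinity(A, B):
    """Affinity matrix W with W[(i,a),(j,b)] = A[i][j] * B[a][b] (assumed form)."""
    n = len(A)
    W = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for a in range(n):
            for j in range(n):
                for b in range(n):
                    W[i * n + a][j * n + b] = A[i][j] * B[a][b]
    return W


def matching_score(W, perm):
    """Quadratic objective x^T W x for the assignment i -> perm[i].

    x is the row-major vectorised permutation matrix, so the pair (i, a)
    sits at index i * n + a and only those n entries of x are non-zero.
    """
    n = len(perm)
    idx = [i * n + perm[i] for i in range(n)]
    return sum(W[p][q] for p in idx for q in idx)


def brute_force_match(W, n):
    """Exact solution of the discrete problem; only feasible for tiny n."""
    return max(permutations(range(n)), key=lambda p: matching_score(W, p))


def sinkhorn(X, iters=200):
    """Alternate row and column normalisation of a strictly positive matrix.

    The result is (approximately) doubly stochastic, i.e. a point of the
    relaxed feasible set that drops the integer constraint.
    """
    n = len(X)
    X = [row[:] for row in X]
    for _ in range(iters):
        for i in range(n):
            s = sum(X[i])
            X[i] = [v / s for v in X[i]]
        for j in range(n):
            s = sum(X[i][j] for i in range(n))
            for i in range(n):
                X[i][j] /= s
    return X


# Toy instance: a 3-node path graph and a relabelled copy of it.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0 - 1 - 2, centre is node 1
B = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path, centre is node 0
W = build_affinity(A, B)
best = brute_force_match(W, 3)
D = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

The exhaustive search grows factorially with the number of nodes, which is why approximate solvers over the relaxed, doubly stochastic set are of interest.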




Generalizing Graph Matching beyond Quadratic Assignment Model

Tianshu Yu, Junchi Yan, Yilin Wang, Wei Liu, Baoxin Li

Neural Information Processing Systems

In this paper, we show that a large family of functions, defined as Separable Functions, can asymptotically approximate the discrete matching problem by varying the approximation controlling parameters.



Unsupervised Foreground Extraction via Deep Region Competition

Peiyu Yu

Neural Information Processing Systems

We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink foreground extraction by reconciling an energy-based prior with generative image modeling in the form of a Mixture of Experts (MoE), where we further introduce learned pixel re-assignment as the essential inductive bias to capture the regularities of background regions. With this modeling, the foreground-background partition can be naturally found through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition [1], a seminal approach for generic image segmentation. Experiments demonstrate that DRC achieves more competitive performance on complex real-world data and challenging multi-object scenes than prior methods. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects, even from categories unseen during training.
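The idea of finding a two-way partition through EM can be illustrated with a deliberately simple stand-in: a two-component 1-D Gaussian mixture over pixel intensities. This is not DRC (there is no energy-based prior, no generative image model, and no learned pixel re-assignment here); it only sketches how the E-step softly assigns pixels to two competing components and the M-step refits each component. The function name `em_two_regions`, the initialisation, and the toy intensities are assumptions made for illustration.

```python
import math


def gaussian(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_regions(pixels, iters=50):
    """Fit a two-component Gaussian mixture to intensities with EM.

    Returns (labels, means); label 1 marks the brighter component.
    Assumes at least two distinct intensity values.
    """
    lo, hi = min(pixels), max(pixels)
    mu = [lo, hi]                                   # darkest / brightest pixel
    var = [max(((hi - lo) / 4.0) ** 2, 1e-6)] * 2   # moderate initial spread
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel.
        resp = []
        for x in pixels:
            w = [weight[k] * gaussian(x, mu[k], var[k]) for k in range(2)]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: refit mixing weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, pixels)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, pixels)) / nk
            var[k] = max(spread, 1e-6)              # floor avoids a zero variance
            weight[k] = nk / len(pixels)
    labels = [int(r[1] > r[0]) for r in resp]
    return labels, mu


# Toy "image": four dark background pixels and four bright foreground pixels.
pixels = [0.10, 0.12, 0.09, 0.11, 0.80, 0.82, 0.79, 0.81]
labels, means = em_two_regions(pixels)
```

In this toy setting the two components compete for each pixel through the normalised responsibilities, which is the loose analogy to the mixture-component interaction described above.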


Robust Contrastive Multi-view Clustering against Dual Noisy Correspondence

Neural Information Processing Systems

Recently, contrastive multi-view clustering (MvC) has emerged as a promising avenue for analyzing data from heterogeneous sources, typically leveraging the off-the-shelf instances as positives and randomly sampled ones as negatives. In practice, however, this paradigm unavoidably suffers from the Dual Noisy Correspondence (DNC) problem, where noise compromises the construction of both positive and negative pairs.
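As a rough illustration of the positive/negative pairing described here, the sketch below computes an InfoNCE-style loss between two views, treating the same-index instance in the other view as the positive and all other instances as negatives. InfoNCE is assumed as a representative contrastive objective; the abstract does not state which loss the authors use, and the names `cosine` and `info_nce`, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Average InfoNCE loss with view_b[i] as the positive for view_a[i].

    Every view_b[j] with j != i acts as a negative for view_a[i].
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        log_denominator = math.log(sum(math.exp(z) for z in logits))
        loss += log_denominator - logits[i]
    return loss / n


view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]      # correct cross-view correspondence
mismatched_b = [[0.1, 0.9], [0.9, 0.1]]   # positives paired with the wrong instance

aligned_loss = info_nce(view_a, aligned_b)
mismatched_loss = info_nce(view_a, mismatched_b)
```

The mismatched case mimics a noisy positive pair: the loss then pushes each instance toward the wrong counterpart, which is the kind of corrupted supervision the DNC problem refers to.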